37 research outputs found

    Online Learning with Switching Costs and Other Adaptive Adversaries

    We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is $\widetilde{\Theta}(T^{2/3})$. Interestingly, this rate is significantly worse than the $\Theta(\sqrt{T})$ rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded memory adversary can force $\widetilde{\Theta}(T^{2/3})$ regret even in the full-information case, proving that switching costs are easier to control than bounded memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies.
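
    Policy regret is easy to confuse with standard external regret, so the sketch below contrasts the two by re-running a bounded-memory adversary on a counterfactual constant-action history. It is a minimal Python illustration, not the paper's code: the helper policy_regret, the memory-1 switching-cost adversary, and all constants are hypothetical.

    ```python
    import random

    def policy_regret(play_history, loss_fn, actions, memory=1):
        """Hypothetical helper, not the paper's code. loss_fn(window) maps the
        last memory+1 actions to a loss, modelling a bounded-memory adversary.
        External regret compares against the best fixed action on the realized
        losses; policy regret instead re-runs the adversary on the counterfactual
        history in which that fixed action is played at every round."""
        T = len(play_history)
        pad = [actions[0]] * memory  # arbitrary warm-up context

        def cum_loss(history):
            h = pad + list(history)
            return sum(loss_fn(tuple(h[t:t + memory + 1])) for t in range(T))

        return cum_loss(play_history) - min(cum_loss([a] * T) for a in actions)

    # A memory-1 adversary: base loss plus a 0.5 switching cost.
    base = {0: 0.4, 1: 0.6}
    loss = lambda w: base[w[-1]] + (0.5 if w[-1] != w[-2] else 0.0)
    rng = random.Random(0)
    plays = [rng.choice([0, 1]) for _ in range(1000)]
    print(policy_regret(plays, loss, actions=[0, 1], memory=1))
    ```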

    On prediction of individual sequences

    Sequential randomized prediction of an arbitrary binary sequence is investigated. No assumption is made on the mechanism generating the bit sequence. The goal of the predictor is to minimize its relative loss, i.e., to make (almost) as few mistakes as the best "expert" in a fixed, possibly infinite, set of experts. We point out a surprising connection between this prediction problem and empirical process theory. First, in the special case of static (memoryless) experts, we completely characterize the minimax relative loss in terms of the maximum of an associated Rademacher process. Then we show general upper and lower bounds on the minimax relative loss in terms of the geometry of the class of experts. As main examples, we determine the exact order of magnitude of the minimax relative loss for the class of autoregressive linear predictors and for the class of Markov experts.
    Keywords: universal prediction, prediction with experts, absolute loss, empirical processes, covering numbers, finite-state machines.
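
    A minimal sketch of the classical randomized exponentially weighted forecaster, the standard algorithm in this expert setting, may help fix ideas. It is illustrative only: the expert functions, the step size, and the interface below are assumptions, not the paper's construction.

    ```python
    import math
    import random

    def weighted_forecaster(bits, experts, seed=0):
        """Randomized exponentially weighted forecaster (illustrative sketch).
        Each expert is a function from the observed prefix to a bit in {0, 1};
        experts that err are penalized multiplicatively."""
        T, N = len(bits), len(experts)
        eta = math.sqrt(8 * math.log(N) / max(T, 1))  # standard tuning
        w = [1.0] * N
        rng = random.Random(seed)
        mistakes, expert_mistakes = 0, [0] * N
        for t, bit in enumerate(bits):
            preds = [e(bits[:t]) for e in experts]
            p1 = sum(wi for wi, p in zip(w, preds) if p == 1) / sum(w)
            mistakes += (1 if rng.random() < p1 else 0) != bit
            for i, p in enumerate(preds):
                if p != bit:
                    expert_mistakes[i] += 1
                    w[i] *= math.exp(-eta)  # penalize mistaken experts
        return mistakes, min(expert_mistakes)

    # Two static experts and one expert predicting the majority bit so far:
    experts = [lambda h: 0, lambda h: 1, lambda h: int(2 * sum(h) > len(h))]
    print(weighted_forecaster([1, 1, 0, 1] * 50, experts))
    ```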

    Delay and Cooperation in Nonstochastic Bandits

    We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than $d$ hops to arrive, where $d$ is a delay parameter. We introduce Exp3-Coop, a cooperative version of the Exp3 algorithm, and prove that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\bigl(d+1 + \tfrac{K}{N}\alpha_{\le d}\bigr)(T\ln K)}$, where $\alpha_{\le d}$ is the independence number of the $d$-th power of the connected communication graph $G$. We then show that for any connected graph, for $d=\sqrt{K}$ the regret bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$ for noncooperating agents. More informed choices of $d$ lead to bounds which are arbitrarily close to the full-information minimax regret $\sqrt{T\ln K}$ when $G$ is dense. When $G$ has sparse components, we show that a variant of Exp3-Coop, allowing agents to choose their parameters according to their centrality in $G$, strictly improves the regret. Finally, as a by-product of our analysis, we provide the first characterization of the minimax regret for bandit learning with delay.
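
    Since Exp3-Coop builds on Exp3, a minimal single-agent Exp3 sketch clarifies the importance-weighted update that the cooperative variant generalizes. This is an assumed textbook implementation, not the paper's code; the step size and the loss interface are illustrative choices.

    ```python
    import math
    import random

    def exp3(K, T, loss, seed=0):
        """Minimal single-agent Exp3 sketch (the base algorithm that Exp3-Coop
        extends with message passing). loss(t, arm) returns the adversarial
        loss in [0, 1] of the arm pulled at round t."""
        eta = math.sqrt(2 * math.log(K) / (K * T))
        rng = random.Random(seed)
        w = [1.0] * K
        total = 0.0
        for t in range(T):
            s = sum(w)
            probs = [wi / s for wi in w]
            arm = rng.choices(range(K), weights=probs)[0]
            l = loss(t, arm)
            total += l
            # Importance weighting keeps the loss estimate unbiased under bandit
            # feedback; Exp3-Coop additionally folds in estimates relayed (with
            # delay at most d) by neighbouring agents.
            w[arm] *= math.exp(-eta * l / probs[arm])
        return total

    print(exp3(K=10, T=5000, loss=lambda t, a: (a + t) % 10 / 9))
    ```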

    Adaptive maximization of social welfare

    We consider the problem of repeatedly choosing policies to maximize social welfare. Welfare is a weighted sum of private utility and public revenue. Earlier outcomes inform later policies. Utility is not observed, but indirectly inferred. Response functions are learned through experimentation. We derive a lower bound on regret, and a matching adversarial upper bound for a variant of the Exp3 algorithm. Cumulative regret grows at a rate of $T^{2/3}$. This implies that (i) welfare maximization is harder than the multi-armed bandit problem (with a rate of $T^{1/2}$ for finite policy sets), and (ii) our algorithm achieves the optimal rate. For the stochastic setting, if social welfare is concave, we can achieve a rate of $T^{1/2}$ (for continuous policy sets), using a dyadic search algorithm. We analyze an extension to nonlinear income taxation, and sketch an extension to commodity taxation. We compare our setting to monopoly pricing (which is easier), and price setting for bilateral trade (which is harder).
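
    To illustrate why concavity helps in the stochastic setting, here is a noisy ternary-search sketch that shrinks an interval around the welfare maximizer. The paper's dyadic search differs in its querying schedule and guarantees, so treat this purely as a stand-in; the batch size, tolerance, and noise model are chosen arbitrarily.

    ```python
    import random

    def noisy_ternary_search(f, lo=0.0, hi=1.0, batch=200, tol=1e-2):
        """Stand-in for the concave stochastic case: maximize a concave welfare
        function from noisy evaluations f(x) by repeatedly discarding a third
        of the search interval."""
        while hi - lo > tol:
            m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
            v1 = sum(f(m1) for _ in range(batch)) / batch
            v2 = sum(f(m2) for _ in range(batch)) / batch
            if v1 < v2:
                lo = m1  # by concavity the maximizer cannot lie left of m1
            else:
                hi = m2
        return (lo + hi) / 2

    # Hypothetical noisy welfare observation, peaked at x = 0.7:
    rng = random.Random(0)
    welfare = lambda x: -(x - 0.7) ** 2 + rng.gauss(0, 0.05)
    print(noisy_ternary_search(welfare))
    ```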

    Nonstochastic Bandits with Composite Anonymous Feedback

    We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over at most d consecutive steps in an adversarial way. This implies that the instantaneous loss observed by the player at the end of each round is a sum of as many as d loss components of previously played actions. Hence, unlike the standard bandit setting with delayed feedback, here the player cannot observe the individual delayed losses, but only their sum. Our main contribution is a general reduction transforming a standard bandit algorithm into one that can operate in this harder setting. We also show how the regret of the transformed algorithm can be bounded in terms of the regret of the original algorithm. Our reduction cannot be improved in general: we prove a lower bound on the regret of any bandit algorithm in this setting that matches (up to log factors) the upper bound obtained via our reduction. Finally, we show how our reduction can be extended to more complex bandit settings, such as combinatorial linear bandits and online bandit convex optimization.
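
    This feedback model is easy to mistake for ordinary delayed feedback, so the sketch below simulates it: each action's loss is split into pieces charged over the next d rounds, and the player observes only per-round aggregates. The harness and its interfaces are hypothetical, not the paper's reduction.

    ```python
    import random

    def composite_feedback_run(policy, loss_components, T, d):
        """Hypothetical simulator of composite anonymous feedback.
        loss_components(t, a) returns at most d nonnegative pieces summing to
        the loss of playing arm a at round t; piece s is charged s steps later.
        The player never sees individual pieces, only each round's aggregate."""
        pending = [0.0] * (T + d)  # pending[t]: aggregate loss surfacing at round t
        observations = []
        for t in range(T):
            arm = policy(t, observations)  # may use past aggregates only
            for s, piece in enumerate(loss_components(t, arm)):
                pending[t + s] += piece    # adversary spreads the loss over <= d rounds
            observations.append(pending[t])
        return sum(pending[:T])

    # Uniform-random policy vs. an adversary splitting each loss evenly over d = 4 steps:
    rng = random.Random(0)
    policy = lambda t, obs: rng.randrange(3)
    split = lambda t, a: [(0.2 if a == t % 3 else 0.5) / 4] * 4
    print(composite_feedback_run(policy, split, T=1000, d=4))
    ```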

    Correlation Clustering with Adaptive Similarity Queries

    In correlation clustering, we are given $n$ objects together with a binary similarity score between each pair of them. The goal is to partition the objects into clusters so as to minimise the disagreements with the scores. In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries. On the one hand, we describe simple active learning algorithms, which provably achieve an almost optimal trade-off while giving cluster recovery guarantees, and we test them on different datasets. On the other hand, we prove information-theoretical bounds on the number of queries necessary to guarantee a prescribed disagreement bound. These results give a rich characterization of the trade-off between queries and clustering error.
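
    As a concrete picture of the query model, the following pivot-style sketch (in the spirit of the classical KwikCluster algorithm) clusters objects using one similarity query per comparison. The paper's algorithms add query budgets and recovery guarantees on top of this idea, so this is an illustration of the setting, not their method.

    ```python
    import random

    def pivot_query_clustering(objects, query, seed=0):
        """Pivot-style active clustering sketch. query(u, v) returns the binary
        similarity of a pair, at unit cost; each round picks a random pivot and
        gathers everything the oracle says is similar to it."""
        rng = random.Random(seed)
        remaining = list(objects)
        clusters, cost = [], 0
        while remaining:
            pivot = remaining.pop(rng.randrange(len(remaining)))
            cluster, rest = [pivot], []
            for v in remaining:  # one similarity query per remaining object
                cost += 1
                (cluster if query(pivot, v) else rest).append(v)
            clusters.append(cluster)
            remaining = rest
        return clusters, cost

    # Ground-truth clusters {0..4} and {5..9}, queried without noise:
    same = lambda u, v: int(u // 5 == v // 5)
    print(pivot_query_clustering(range(10), same))
    ```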

    The future of Cybersecurity in Italy: Strategic focus area
